nlp_architect.data.ptb.PTBDataLoader

class nlp_architect.data.ptb.PTBDataLoader(word_dict, seq_len=100, data_dir='/Users/pizsak/data', dataset='WikiText-103', batch_size=32, skip=30, split_type='train', loop=True)[source]

Class that defines the data loader.

__init__(word_dict, seq_len=100, data_dir='/Users/pizsak/data', dataset='WikiText-103', batch_size=32, skip=30, split_type='train', loop=True)[source]

Initialize class

Parameters

word_dict – PTBDictionary object
seq_len (int) – sequence length of data
data_dir (str) – location of corpus data
dataset (str) – name of corpus
batch_size (int) – batch size
skip (int) – number of words to skip over while generating batches
split_type (str) – train/test/valid
loop (bool) – whether or not to loop over the data when it runs out
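The interplay of seq_len, skip, and batch_size can be illustrated with a minimal sketch in pure Python. This is not the actual implementation; the sliding-window reading of the skip parameter is an assumption based on the parameter descriptions above:

```python
def make_windows(data, seq_len, skip):
    """Slice a token stream into overlapping windows of length seq_len,
    starting a new window every `skip` tokens (an assumed reading of
    the `skip` parameter)."""
    windows = []
    start = 0
    while start + seq_len <= len(data):
        windows.append(data[start:start + seq_len])
        start += skip
    return windows

tokens = list(range(10))  # stand-in for an index-encoded corpus
print(make_windows(tokens, seq_len=4, skip=3))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With skip smaller than seq_len, consecutive windows overlap, so each token can appear in more than one training sample per epoch.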

Methods

__init__(word_dict[, seq_len, data_dir, …])

Initialize class

decode_line(tokens)

Decode a given line from index to word

get_batch()

Get one batch of the data

load_series(path)

Load all the data into an array

reset()

Resets the sample count to zero and re-shuffles the data

decode_line(tokens)[source]

Decode a given line from index to word

Parameters

tokens – list of indexes

Returns

str, a sentence
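A minimal sketch of the index-to-word decoding this method performs. The mapping below is hypothetical; in the real loader the PTBDictionary object supplies it:

```python
# Hypothetical index-to-word mapping; PTBDictionary provides the real one.
idx2word = {0: "the", 1: "cat", 2: "sat"}

def decode_line(tokens):
    """Look up the word for each index and join them into a sentence."""
    return " ".join(idx2word[t] for t in tokens)

print(decode_line([0, 1, 2]))  # → "the cat sat"
```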

get_batch()[source]

Get one batch of the data

Returns

None
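For a language-model data loader, a batch typically pairs each input window with the same window shifted one token to the right. The sketch below illustrates that convention; it is an assumption about the batch layout, not the library's actual code:

```python
def get_batch(data, batch_size, seq_len, offset=0):
    """Assumed layout: return (inputs, targets), where each target
    sequence is its input sequence shifted right by one token."""
    xs, ys = [], []
    for b in range(batch_size):
        start = offset + b * seq_len
        xs.append(data[start:start + seq_len])
        ys.append(data[start + 1:start + seq_len + 1])
    return xs, ys

data = list(range(20))
x, y = get_batch(data, batch_size=2, seq_len=3)
print(x)  # → [[0, 1, 2], [3, 4, 5]]
print(y)  # → [[1, 2, 3], [4, 5, 6]]
```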

load_series(path)[source]

Load all the data into an array

Parameters

path (str) – location of the input data file
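A plausible sketch of this step: read a whitespace-tokenised corpus file and flatten it into one array of word indexes. The word-to-index mapping and the unknown-token fallback are hypothetical; the real loader draws them from the PTBDictionary object:

```python
# Hypothetical word-to-index mapping; PTBDictionary provides the real one.
word2idx = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

def load_series(path):
    """Read a whitespace-tokenised text file and return the whole
    corpus as one flat list of word indexes (assumed behaviour)."""
    series = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            series.extend(word2idx.get(w, word2idx["<unk>"]) for w in line.split())
    return series
```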

reset()[source]

Resets the sample count to zero and re-shuffles the data

Returns

None
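What reset() plausibly amounts to can be sketched with a tiny stand-in class: zero the sample counter and re-shuffle the order in which samples are visited. The class and its fields are hypothetical illustrations, not the loader's actual internals:

```python
import random

class LoaderState:
    """Hypothetical stand-in for the loader's internal epoch state."""

    def __init__(self, num_samples, seed=0):
        self.order = list(range(num_samples))  # visiting order of samples
        self.sample_count = 0                  # samples served this epoch
        self._rng = random.Random(seed)

    def reset(self):
        """Zero the sample counter and re-shuffle the visiting order."""
        self.sample_count = 0
        self._rng.shuffle(self.order)

state = LoaderState(5)
state.sample_count = 3   # pretend part of an epoch has been consumed
state.reset()
print(state.sample_count)  # → 0
```

Calling reset() between epochs gives each epoch a fresh sample ordering while keeping every sample exactly once in the permutation.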